Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected

Authors

  • Aditya Gopalan
  • Prashanth L. A.
  • Michael C. Fu
  • Steven I. Marcus
Abstract

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions: the classic K-armed bandit and the linearly parameterized bandit. In both settings, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate cost distortions, and exhibit sublinear regret assuming Hölder-continuous weight distortion functions. For the K-armed setting, we show that the algorithm, called W-UCB, achieves problem-dependent regret O((LM)^(2/α) · log n / Δ^(2/α − 1)), where n is the number of plays, Δ is the gap in distorted expected value between the best and next-best arm, L and α are the Hölder constants for the distortion function, and M is an upper bound on costs, and a problem-independent regret bound of O((KL²M²)^(α/2) · n^((2−α)/2)). We also present a matching lower bound on the regret, showing that the regret of W-UCB is essentially unimprovable over the class of Hölder-continuous weight distortions. For the linearly parameterized setting, we develop a new algorithm, WOFUL (Weight-distorted OFUL), a variant of the Optimism in the Face of Uncertainty Linear bandit (OFUL) algorithm of Abbasi-Yadkori et al. [2011], and show that it has regret O(d√n · polylog(n)) with high probability, for sub-Gaussian cost distributions. Finally, numerical examples demonstrate the advantages resulting from using distortion-aware learning algorithms.
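
To make the distorted-value objective concrete, here is a minimal Python sketch (not the authors' implementation) of the two ingredients a W-UCB-style algorithm needs: a Choquet-style estimate of the distorted expected cost built from the sorted samples, and an optimistic index whose confidence width follows the Hölder/DKW shape suggested by the regret bound. The distortion w, the constants in the width, the horizon, and the Bernoulli cost arms are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def w(p):
    # Illustrative Hölder-continuous distortion with alpha = 1/2, L = 1;
    # any weight function with w(0) = 0 and w(1) = 1 could be swapped in.
    return np.sqrt(p)

def distorted_value(samples):
    # Choquet-style estimate: for sorted costs x_(1) <= ... <= x_(n),
    # sum_i x_(i) * (w((n - i + 1)/n) - w((n - i)/n)).
    # With the identity distortion this is exactly the sample mean.
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail = np.arange(n, 0, -1) / n  # empirical P(cost >= x_(i))
    return float(np.dot(x, w(tail) - w(tail - 1.0 / n)))

def w_lcb_index(samples, t, L=1.0, M=1.0, alpha=0.5):
    # Optimistic (lower-confidence) index for cost minimization; the width
    # L * M * (2 log t / s)^(alpha / 2) mirrors the Hölder/DKW-style
    # concentration behind the regret bound, with illustrative constants.
    s = len(samples)
    return distorted_value(samples) - L * M * (2.0 * np.log(t) / s) ** (alpha / 2.0)

# Toy run: two Bernoulli cost arms with means 0.4 and 0.6 on [0, 1].
means = [0.4, 0.6]
history = [[float(rng.binomial(1, m))] for m in means]  # one initial pull each
for t in range(len(means) + 1, 2001):
    arm = min(range(len(means)), key=lambda i: w_lcb_index(history[i], t))
    history[arm].append(float(rng.binomial(1, means[arm])))
print([len(h) for h in history])  # the lower-cost arm should dominate
```

With the identity distortion the estimator reduces to the sample mean and the index to a standard UCB-style rule for costs, which is one way to see the distorted setting as a strict generalization of the classic one.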

Similar articles

Modal Bandits

Analyses of multi-armed bandits primarily presume that the value of an arm is its expected reward. We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions.
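
As a hedged illustration of that value notion (not the paper's actual method), an arm could be scored by a simple histogram mode estimate instead of a sample mean; the binning, the assumed [0, 1] support, and the bin-midpoint convention below are assumptions.

```python
import numpy as np

def mode_estimate(samples, bins=20):
    # Histogram-based mode estimate for rewards assumed to lie in [0, 1]:
    # return the midpoint of the most heavily loaded bin.
    counts, edges = np.histogram(samples, bins=bins, range=(0.0, 1.0))
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])
```

A bandit that ranks arms by this score optimizes a mode-based objective rather than the usual expected reward.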

A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements

We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...

Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of t...

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is O(K√(n log n)) and against stochastic bandits the pseudo-regret is O(Σ_i (log n)/Δ_i). We also show that no algorithm with O(log n) pseudo-regret against stochastic bandits can achieve Õ(√n) expected regret against adaptive...

Asymptotically optimal priority policies for indexable and non-indexable restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property an...

Publication date: 2017